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Abstract 

For  a  general  stationary  and  invertible  ARMA  (p,q)  process,  we  show  how  to  carry 
out  a  fully  Bayesian  analysis.  Our  approach  is  through  the  use  of  sampling  based  methods 
involving  three  novel  aspects.  First  the  constraints  on  the  parameter  space  arising  from 
the  stationarity  and  invertibility  conditions  are  handled  by  a  convenient  reparametri2ation 
to  all  of  Euclidean  (p4-q)-space.  Second,  required  sampling  is  facilitated  by  the 
introduction  of  latent  variables  which,  though  increasing  the  dimensionality  of  the 
problem,  greatly  simplifies  the  evaluation  of  the  likelihood.  Third,  the  particular  sampling 
based  approach  used  is  a  Markov  chain  Monte  Carlo  method  which  is  a  hybrid  of  the  Gibbs 
sampler  and  the  Metropolis  algorithm.  We  also  briefly  show  how  straightfcrwardly  the 
sampling  based  approach  accommodates  missing  observations,  outlier  detection,  prediction 
and  model  determination.  Finally  we  illustrate  the  approach  with  two  examples. 

Keywords:  Gibbs  sampler,  invertibility,  latent  variables.  Metropolis  algorithm,  missing 
values,  outliers,  prediction,  stationarity. 
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1.  Introduction 

Autoregressive  moving  average  models  constitute  a  broad  class  of  parsimonious  time 
series  processes  which  are  useful  in  describing  a  wide  variety  of  time  series.  In  particular, 
the  process  {z^}  is  said  to  be  an  autoregressive  moving  average  process  of  order  p,  q, 
ARMA  (p,  q),  with  mean  /i,  if  it  is  generated  by 

=  ^B)c^  (1) 

where  B  is  the  backward  shift  operator,  ^  Gaussian  white  noise  process 

with  variance  ^(B)  =  1  -  B  -  •  •  •  -  and  ^B)  =  1  -  B  -  •  •  •  -  ^^B'^  are 
polynomials  in  B  of  degrees  p  and  q  respectively. 

For  the  process  to  be  stationary  the  roots  of  ^(B)  =  0  must  lie  outside  the  unit 
circle  and  if  the  roots  of  ^B)  =  0  also  lie  outside  the  unit  circle  the  process  is  said  to  be 
invertible,  in  which  case  there  is  a  unique  model  corresponding  to  the  likelihood  function. 
It  may  be  argued  (Box  and  Jenkins,  1976)  that,  in  forecasting,  a  nonstationary  process  is 
undesirable  and  a  noninvertible  process  is  meaningless. 

If  the  stationarity  and  invertibility  conditions  are  to  be  satisfied,  the  parameter 
vectors,  ^  =  (^j  ,•  •  •  and  $  =  ,•  •  •  ,^q),  are  constrained  to  lie  in  regions  Cp  and  C^, 

respectively,  corresponding  to  the  polynomial  operator  root  conditions.  Allowable  values  of 
(^,  ff)  then  lie  in  Cp»Cq,  the  forms  of  which  are  simple  to  identify  for  p<2,  q<2.  However 
for  k>2  the  form  of  becomes  complicated  and  for  k>4  the  polynomial  equations  ^(B)  = 
0  and  ^B)  =  0  cannot,  in  general,  be  solved  analytically. 

In  a  Bayesian  setting,  for  a  stationary  invertible  ARMA  (p,q)  time  series  model  of 
the  form  (1),  the  region  CpxC^  determines  the  ranges  of  integration  for  obtaining  joint  and 
marginal  distributions  of  the  parameters  and  for  evaluating  posterior  expected  values. 
Historically,  Bayesian  analysis  of  these  models  ignores  this  region  in  order  to  obtaiu 
convenient  distributional  results  for  the  posterior  densities.  See,  e.g.,  Zellner  (1971),  Box 
and  Jenkins  (1976),  Broemeling  (1985),  Broemeling  and  Shaaraway  (1988).  Only  Monahan 
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(1983)  and  Marriott  and  Smith  (1992)  have  restricted  the  analysis  to  Cp>«Cq  employing 
analytic  numerical  integration  techniques.  Monahan’s  calculations  used  fixed  point 
quadrature  rules  for  p+q<2  while  Marriott  and  Smith  extended  these  results  using  the 
Gauss— Hermite  rules  described  in  Smith  et.  al.  (1987).  Generally,  to  accurately  perform 
such  numerical  integration  requires  specialist  software  and  expertise  and  typically,  for  a 
model  with  unknown  /i  and  ^  becomes  infeasible  for  p+qH. 

We  propose  the  use  of  sampling  based  methods  to  obtain  desired  marginal 
posteriors.  In  particular,  the  introduction  of  the  Gibbs  sampler  as  a  tool  for  Bayesian 
calculations  in  Gelfand  and  Smith  (1990)  has  spurred  considerable  interest  in  Markov  chain 
Monte  Carlo  methods.  We  use  a  variant  of  the  Gibbs  sampler  described  in  Muller  (1991). 
Recently  Chib  (1991)  and  Chib  and  Greenberg  (1992)  have  utilized  the  Gibbs  sampler  for 
Bayesian  analysis  of  AR  and  MA  processes.  Their  work  differs  from  ours  in  several  ways. 
Most  notably,  in  the  first  of  these  papers  the  conditional  likelihood  is  used  while  in  the 
second  a  computationally  expensive  form  of  the  unconditional  likelihood  is  used.  In  neither 
paper  is  the  full  ARMA  model  handled.  Proposed  sampling  from  complete  conditionals  for 
^  and  B  requires  a  "double  rejection"  method.  In  our  early  work  we  abandoned  such 
sampling  as  terribly  inefficient  for  larger  p  and  q.  We  also  note  that  McCulloch  and  Tsay 
(1991)  have  employed  the  Gibbs  sampler  for  a  Bayesian  analysis  of  autoregressive  processes 
again  ignoring  stationarity  restrictions.  Carlin,  Poison  and  Stoffer  (1992)  use  the  Gibbs 
sampler  for  Bayesian  analysis  of  first  order  dynamic  models  also  without  concern  for 
stationarity. 

The  outline  of  the  paper  then  is  as  follows.  In  Section  2  we  give  the  ARMA 
likelihoods  in  a  form  well  suited  for  our  Monte  Carlo  method.  In  Section  3  we  obtain  the 
complete  conditional  distributions  required  for  the  Gibbs  sampler.  Section  4  provides  a 
useful  transformation  to  enable  us  to  handle  the  stationarity  and  invertibility  constraints. 
Section  5  describes  a  modified  Gibbs  sampler  which  works  well  for  time  series  models  with 
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Gaussian  error.  Section  6,  7  and  8  briefly  deal  with  the  issues  of  missing  data,  prediction 
and  model  determination.  Finally  Section  9  analyzes  two  interesting  data  sets. 

2.  Likelihood  forma 

In  Section  2.1  we  note  that  the  likelihood  for  the  general  ARM  A  (p,q)  model  can  be 
written  in  a  relatively  simple  form  by  conditioning  upon  the  unobserved  history  of  the 
process.  This  form  is  inexpensive  to  evaluate  making  it  attractive  for  use  with  the  Gibbs 
sampler.  In  Section  2.2  we  observe  that  for  the  pure  autoregressive  process,  AR(p),  we  can 
marginalize  over  this  history  resulting  in  a  lower  dimensional  likelihood  which  can  still  be 
evaluated  cheaply. 

2.1  The  general  ARMA  (p,q)  likelihood 

For  the  general  ARMA  (p,q)  model  in  (1)  the  form  of  the  exact  likelihood  is  known 
(Newbold,  1974).  While  in  theory  it  is  possible  to  work  with  this  form,  its  evaluation, 
unless  the  sample  size  is  very  small,  is  computationally  very  expensive  rendering  it 
infeasible  for  use  vnth  a  Monte  Carlo  approach.  A  form  for  the  likelihood  which  is  well 
suited  for  sampling  based  techniques  can  be  developed  by  introducing  latent  variables 
(parameters).  Dimensionality  of  the  parameter  space  is  increased  in  exchange  for  a 
manageable  likelihood. 

Letting  y^  =  -  /i,  (1)  can  be  written  as 

yt  =  ^  yfi  -  ^  +  S-  (2) 

j=l  ^  ‘  J  j=l  J  ‘  ‘ 

Hence  we  denote  the  likelihood  for  n  observations  *  =  (zj ,  •  •  • ,  z^  )  by  f(z;  ip)  where 
0,  fi,  <T^)  with  <p  =  )  and  We  introduce  as  latent 

variables  the  unobserved  history  =  (y^,  y  j,  •••,  y^  ^)  and  Cq  =  (e^,  •••,  e^.^) 


5 


resulting  in  the  augmented  parameter  vector  ^  =  (y^,  0,  n,  o^)  of  dimension 

2(p4-q+l).  The  conditional  likelihood  is  then  obtained  &om  the  factorization 
f(z|^)  =  f(zi  IV^)  f(z2|zj,  ...  f(zjzj, 


where 


=  (2irir^)'7exp{~l  E  (Yt-Mif} 


t=l 


i=l 


i=l 


(3) 


t-1 


=  .1^  <t>i  yfi  -.1^  Vi  for t  =  2,  . ■  q 


and 


Xf  i  ^i (y^  i  -  V- for  t  =  q+1,  •  •  • . 


2.2  The  general  AR(p)  likdihood 

In  the  special  case  of  an  AR(p)  process  (^B)  =  1)  latent  variables  are  not  needed. 
We  have  ^  =  (^,  /r,  <7^)  and  the  exact  likelihood  can  be  factored,  analogous  to  (3),  as 


f(z|t&)  =  {2wy^ 


n  at 

t=l  ‘ 


^exp|-^^E^(y^-/ijV<r5}- 


(4) 


In  (4),  /X,  =  0,  a\  =  E  <!>]%+  2  S  ;+^r^  while  for  t  =  2,-  -.p 

j=l  l<i<j<P  ^ 

1  9  P  9 

^  =  S  ^j7o  +  2  E  <f>i<l>ir.  +  and  for  t  =  p+l,---,n, 

j=l  '  *  J  j=t  t<i<j<p  ^  ^  * 

p 

Here  7^,  k  =  0,  1,  2,  •  •  •  is  the  autocovariance  of  the  AR(p)  process  of  lag  k.  Using 
7o  =  . ^pPp),  we  can  write 
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2  C 


‘~0p  Pp 


£  ^ j  +  2  E  . 

j=l  i<j  ‘  J  • 


+  a" 


and  for  t  =  2,-  •  -  jP 


^2  _  g 

‘  1-01  Pi - <Pp  Pp 


E  0]  +  2  E  0i0iP.  . 
j=t  i<t<j  ^  J  ’ 


+  a 


where  P;  =  —  ,  so  that,  for  all  t,  a\  is  of  the  form  a^c, .  The  p.  's  can  be  obtained  in  terms 


of  the  ’s  using  McLeod’s  (1975,  1977)  algorithm. 


3.  Complete  conditional  distributions 

Given  a  prior  distribution  on  0^,  ^0*)  the  posterior  density  for  0* 

jr(0*|z)  «  f(*|0*)  .  ir(0*)  (5) 

Bayesian  inference  proceeds  by  obtaining  marginal  posterior  distributions  of  the 
components  of  0^  as  well  as  features  of  these  distributions. 

The  Gibbs  sampler  introduced  by  Gelfand  and  Smith  (1990)  as  a  tool  for  carrying 
out  Bayesian  calculations  is  a  Markovian  updating  scheme  which  requires  sampling  from 
the  complete  conditional  distributions  associated  with  0  (see  e.g.,  Gelfand  and  Smith  or 
Gelfand  et.  al.,  1990,  for  details).  A  key  point  is  that  each  complete  conditional  density  is 
also  proportional  to  the  right  side  of  (5).  In  certain  cases,  we  may  recognize  this  form  as 
that  of  a  standard  distribution.  In  more  challenging  cases,  it  emerges  only  as  a  non 
standard,  non  normalized  density.  We  shall  see  that,  for  the  Bayesian  models  in  (5),  the 
complete  conditional  distributions  for  /x  and  illustrate  the  former  case  with  0  and  0 
illustrating  the  latter. 

In  the  general  ARMA  (p,q)  case  it  is  tempting  to  consider  a  noninformative  prior 
specification  for  0*,  ?r(0^)  a  Unfortunately  such  specification  yields  posterior 

distributions  for  y^  and  Cq  which  are  improper.  (This  may  be  readily  seen  from 
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straightforward  calculations  for,  say,  an  ARMA  (1,0)  model).  Rather,  since  the  y^  and  Cq 
are,  in  fact,  like  y^  and  respectively,  their  prior  spedfication  should  be  a  proper 
distribution  given  Hence  the  prior  for  takes  the  form 

7r(^)  =  ir(yo,  fo|t&)  •  <1^) 

where  we  can  assume  a  noninformative  spedfication  for  7r(^),  i.e.,  ir(^)  a  a  . 

Indeed,  Newbold  (1974,  p.  424)  presents  the  choice  of  ir(yQ,  CqUi)  which,  upon 
integration  over  and  yields  the  previously  mentioned  exact  likelihood.  It  is  a 
multivariate  normal  with  mean  0  and  covariance  matrix  A  arising  &om  the  stationary 
ARMA  process.  Working  with  the  (p+q)«{p+q)  matrix  A  presents  the  same 
computational  problem  as  working  with  the  exact  likelihood.  We  make  a  simplification 
which,  on  both  intuitive  and  empirical  grounds,  little  affects  inference  about  if  or  forecasts. 


We  replace  A  with  A  = 


0  aX 


where  is  the  variance  of  the  stationary  ARMA 


(p,q)  process  and  is  the  assumed  error  variance.  Thus,  the  joint  posterior  density  for  ^ 


ic{f^\z)  a  (a^)-^^exp{--i  S  (y^-/ij^}  <7^,,  cj ^) 


2a  t=l 

with  /Xj  defined  below  (3). 

With  a  little  manipulation,  we  may  show  that  the  complete  conditional  distribution 

1  ”  a^  2 

for  fi  is  normal  with  mean  j  1  and  variance  ^  and  that  for  cr  it  is  inverse 

t“l 

Gamma,  i.e.,  IG(j,  j  E  (yt'A^t)^)-  ^  constant  mean  fx  for  the  z’s  need  not 

be  assumed.  Rather,  for  observation  we  could  replace  n  with  x  /?  where  x  is  a  vector  of 
covariates.  No  complications  to  the  sampling— based  approach  result;  the  normal  complete 
conditional  distribution  for  fx  is  replaced  by  a  multivariate  normal  complete  conditional 
distribution  for  The  complete  conditional  densities  for  the  ’s  and  9-^  ’s  are  proportional 
to  (6)  and  must  be  sampled  subject  to  the  restriction  to  Cp-C^.  We  develop  an  efficient 
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sampling  procedure  in  Section  5.  The  complete  conditional  distributions  of  the  latent 
variables  e  .  and  y  .  can  be  shown  to  all  be  normal.  However  calculation  of  means  and 

i-j  •'i-j 

variances  requires  complicated  bookkeeping  as  the  expressions  involve  summations  over 
t  =  1,  2,  •  •  •,  n  with  individual  terms  having  forms  which  change  according  to  the  values  of 
t,  j,  p  and  q.  To  ameliorate  the  programming  burden  we  have  found  it  easier  to  again  just 
work  with  (6). 

“2  2 

In  the  purely  autoregressive  case,  AR(p),  the  noninformative  prior  /i,  a  )  aa 
can  be  used.  From  (4)  we  can  see  that  fi  is  conditionally  normal  with  mean 


“  1 

E  — 

f  S  1  1 

E  — - —  and  variance 

s  -4- 

t=l  ffl 

t=l 

_t=l  al  . 

and  that 


which  is  an  inverse  Gamma  density,  where  the  c^  are  defined  below  (4).  However,  as  in  the 
general  ARMA  case,  sampling  the  subject  to  the  stationarity  restrictions  is  not  routine. 

4.  A  useful  repaiametrization 

Consider,  first,  the  pure  autoregressive  model.  The  constrained  parameter  region 
for  0,  Cp  ,  is  analytically  intractable  for  p>4.  To  complicate  matters  further,  the  Gibbs 
sampler  requires  cross-sections  of  Cp ,  that  is,  sets  for  given  ,  i^j.  To  circumvent  the 
problem  of  dealing  with  these  parameter  constraints,  we  consider  successive 
transformations  from  Cp  to  a  p-dimensional  hypercube  and  then  to  R^.  In  particular, 
Barndorif— Nielsen  and  Schou  (1973)  reparametrize  ^  in  terms  of  the  partial 
autocorrelations  r.  of  the  AR(p)  process.  This  transformation  which  is  one-to-one  is 
defined  by 
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(Pk  -  Tk 

,(k)  _  ^(k-1)  ^  ,  ,  , 

9i  =  fi  “^k^k-i 


where  ^  =  (^i^V  •  being  the  coefficient  from  an  AR(p)  process  and 

^  =  (^1 )  ■  ■ '  i^p  )•  Monahan  (1984)  reports  the  inverse  transformation  in  the  iterative  form 


and  the  Jacobian 

[p/2] 

J  =  j  (i-r5)Kk-»/21  n  (1-.,,)  (9) 

k=l  j=l 

The  condition  that  ^  e  Cp  now  becomes  |rkl<l.  We,  then  apply  a  second 

transformation  from  r  to  r*  €  R*’.  Based  cn  the  experience  of  other  authors,  for  example, 

Smith  et.  al.  (1987),  we  use  the  "Fisher— type”  transformation  discussed  in  Marriott  and 
Smith  (1992)  which  performs  best  for  densities  with  "normal  shapes"  or  mixtures  of  such 


shapes: 


We  would  do  all  of  the  random  generation  in  the  space  of  the  r^’s  inverting  the  r*’s  back  to 
0’s  at  the  end. 


fl+r^' 

r*  =  iog -  j=l,*--,p. 


The  complete  conditional  density  for  r*  arises  by  transforming  the  nonnormalized 
joint  density  (5)  from  (0,  /i,  a^)  to  (r*,  /x,  <P)  and  considering  the  resulting  expression  as  a 
fimction  of  r|  with  r’J,  i^j,  /x  and  fixed.  We  denote  this  nonnormalized  complete 
conditional  form  by  f(r^|r’J,  i^j,  fi,  a^). 

For  the  general  ARMA  model,  assuming  invertibility  and  statiorarity,  we  must,  in 
addition,  generate  0. ’s  from  their  complete  conditional  distributions,  restricting  0  to  C^. 
Monahan  (1984)  has  shown  that  the  above  two-stage  transformation  can  also  be  used  for 
the  moving  average  parameters.  We  merely  replace  0  with  0  and  p  with  q.  Applying  the 
transformation  to  both  0  and  0  results  in  a  transformation  from  (0,^  e  CpxC^  to  say 

('J.  'S)  6  R"*’- 
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5.  Efficient  sampling;  a  modified  Gibbs  sampler 

The  standard  Gibbs  sampler  (see  Section  3)  proceeds  by  making  draws  from  the 
complete  conditional  distributions  in  some  systematic  order.  When  these  distributions  are 
familiar  forms  sampling  is  routine.  When  this  is  not  so  (and  when  the  complete  conditional 
density  does  not  have  special  properties  such  as  log  concavity)  a  variety  of  general 
techniques  have  been  proposed.  These  include  approximate  c.d.f.  inversion  (as  in  Tanner, 
1991),  ratio-of-uniforms  (as  in  Wakefield,  Gelfand  and  Smith,  1991)  and  adaptive 
mixtures  (as  in  West,  1992). 

An  alternative  which  we  employ  here,  for  each  such  nonroutine  draw,  runs  a  scalar 
valued  Markov  chain  Monte  Carlo  algorithm  whose  stationary  distribution  is  the  desired 
complete  conditional.  Muller  (1991)  proves  that,  under  mild  conditions,  use  of  such 
univariate  trajectories  within  a  trajectory  of  the  Gibbs  sampler  results  in  a  Markov  chain 
whose  stationary  distribution  is  the  desired  joint  posterior  distribution. 

More  precisely,  in  our  present  case  suppose  we  use  Ansley’s  (1979)  algorithm  to 
obtain  the  MLE  for  ^  and  0  say  (^,  0)  with  associated  asymptotic  covariance  matrix 
^(^  ‘ff)‘  transformation  of  Section  4  we  convert  (^,  9 )  to,  say,  (r^,  and,  via 

the  delta  method  obtain  an  approximation  to  the  covariance  matrix  of  (r^,  which  we 

denote  by  E*.  Let  gp^q(r^,  rp  be  the  multivariate  normal  distribution  over  having 

mean  (r^,  and  covariance  matrix  E*.  For  any  coordinate  rj  (from  either  or  let 

g(rj  |rf.j,)  denote  its  univariate  conditional  normal  distribution  given  all  of  the  other  r*’s 
derived  from  g 

Op+q 

We  now  take  g(rj|r^.j,)  to  be  a  Gaussian  proposal  for  a  univariate  Metropolis 
algorithm.  That  is,  we  create  the  following  Markov  chain.  If  the  current  value  of  r *  =  u 
and  a  draw  from  g(rJ|r^.j,)  yields  v,  we  calculate  the  ratio 
a(u,v)  =  f(v|r^,  i^j,  /x,  ff^)/f(u|r’5,  i^j,  /x,  a^).  If  q>1  we  move  to  v;  if  o<l  we  move  to  v 


11 


with  probability  a.  It  may  be  straight  forwardly  demonstrated  that  the  stationary 
distribution  of  this  Markov  chain  is  the  normalized  density  associated  with 
f(r|  I  r^,  i^j, /i,  cr^)  (see,  e.g.,  Hastings,  1970  or  Tierney,  1991).  An  iteration  of  this 
modified  Gibbs  sampler  is  thus  implemented  by,  starting  with  r’^,  running  the  associated 
Markov  chain  for  m  steps,  taking  the  state  of  r*  at  the  m^**  step  as  the  updated  value  and 
then  proceeding  to  r^,  etc.  After  all  the  rj  have  been  updated  we  draw  /x,  then  a^,  then  yg 
and  then  Cq  to  complete  one  iteration. 

This  hybrid  algorithm  is  attractive  in  our  case  because  the  Ts  tend  to  be  unimodal 
and  roughly  normal  making  the  Gaussian  proposal  we  are  using  efficient.  Empirical 
experience  has  shown  that  this  algorithm  is  readily  automated  and  requires  far  fewer 
likelihood  evaluations  than  the  previously  mentioned  techniques  resulting  in  substantially 
shorter  run  times. 

To  initiate  independent  parallel  replication  of  the  sampler  we  perturb  the  MLE  (r^, 
f^,  Ko)  obtaining  a  total  of  mj  starting  values  (we  use  mj  =  40).  After  proceeding  for 
kj  iterations  with  these  mj  replications,  by  resampling  we  increase  mj  to  mj  replications 
(we  use  m2=200).  We  proceed  for  kj  further  iterations  with  the  m2  replications, 
monitoring  the  stability  of  selected  quantiles  to  judge  when  convergence  may  be  assumed. 
Finally  we  increase  mj  to  m3  replications  (we  use  m3  =  500).  The  sampler  is  then  run  for 
k3  further  iterations  before  termination.  See  Gelfand  and  Smith,  (1990)  and  Gelfand  et  al. 
(1990)  for  further  discussion  of  such  sampling  schedules. 

In  concluding  this  section  we  mention  an  alternative  approach  for  sampling  the  ^’s 
and  ^8  based  upon  an  idea  in  Jones  (1987).  Jones  observes  that,  for  instance,  ^  may  be 
drawn  uniformly  over  Cp  by  drawing  r  according  to  a  product  Beta  distribution  over  the 
hypercube  |rj  |  <1,  j=l,2,‘  •  •,p.  Using  ideas  in  Smith  and  Gelfand  (1992),  for  given  p  and 
a  ,  such  ^8  may  then  be  resampled  to  essentially  have  the  distribution  f(^|/x,  a  ,  z). 
However,  this  resampling  would  have  to  be  done  for  each  iteration  vnthin  each  replication 
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of  the  Gibbs  sampler  (i.e.  for  each  new  /i,  pair).  We  believe  that  this  method  would  be 
a  very  inefficient  competitor  to  our  sampling  scheme. 

6.  Missing  Values  and  Outliers 

In  this  section,  we  propose  the  use  of  the  sampling  based  approach  to  handle  missing 
values  and  to  detect  outliers  in  an  ARMA(p,q)  process.  This  is  done  by  treating  such  a 
value,  say  z^.,  as  an  unknown  and  adding  it  as  a  parameter  to  the  Gibbs  sampler.  The 
complete  conditional  distribution  for  z^,  f(Zj.|*_^,  ^),  is  proportional  to  the  right  side  of 

(6)  and  in  fact  is  a  normal  distribution  with  mean  and  variance  having  forms  similar  to 
those  for  the  y^^^.  Alternatively,  a  convenient  Gaussian  proposal  is  the  conditional 

distribution  f(Zj.  |zj  ,♦  •  iu  (3).  A  natural  candidate  for  a  starting  value  zj,®  is  the 

mean  of  the  available  z.  or  perhaps  an  interpolated  value  using  adjacent  observed  z’s.  We 
perturb  to  obtain  m^  starting  values  for  the  mj  initiations  of  the  sampler.  An 
iteration  of  the  Gibbs  sampler  then  proceeds  as  discussed  in  Section  5  generating,  in 
addition,  a  z^.  It  is  straightforward  to  extend  this  procedure  to  the  situation  with  several 
missing  values  or  outliers. 

7.  Prediction 

In  a  Bayesian  analysis  of  ARM  A  models,  prediction  proceeds  via  the  predictive 

density 

f(zF  I  *)  =  /f(zF  I «,  I  *  )  d^  (10) 

where  f(*p|^,  z)  is  the  density  of  the  future  data  *p.  If  ip  =  z^^^)  then 

f(zp|*,  r)  =  f(z^^j f(z„,2  |z„,,,  «.  if)  •  •  •  *> 

of  (10)  contrasts  sharply  with  the  non-Bayesian  practice  in  time  series  analysis  of  basing 
forecasts  on  a  particular  set  of  estimated  parameter  values,  i.e.,  on  f(*plz,  if).  It  is  well 
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known  (see  e.g.  Carlin  and  Gelfand  (1990))  that  although  point  estimates  arising  from 
f(zp|z,  ^)  perform  well,  interval  estimates  will  generally  be  too  short,  the  variance  of  this 
distribution  too  small.  In  fact  there  is  recent  literature  on  correcting  for  this  effect  by 
incorporating  the  variability  due  to  parameter  estimation  into  the  prediction  mean  square 
error  (Stine,  1987). 

In  studying  (10)  the  sampling  based  approach  again  fits  nicely.  Using  the  output  of 
the  Gibbs  Sampler,  ^  ,  j=l,  -  •  •  ,m,  an  approximation  to  the  density  itself  may  be  obtained 
as  the  Monte  Carlo  integration 

=  5  (11) 

To  obtain  a  sample  of  predictions  from  the  density  (10),  for  each  we  draw  Zp  .  from 


8.  Model  determination 

Model  determination  involves  two  aspects  —  model  choice  (selection  amongst 
models)  and  model  adequacy  (performance  of  a  particular  model).  Bayesian  assessment  of 
model  adequacy  resides  in  predictive  distributions  by  which  comparison  is  made  between 
what  the  model  predicts  and  what  was  observed.  (In  fact  most  any  model  evaluation 
scheme  relies  on  such  comparison). 

For  pairwise  choice  between  models  the  formal  Bayes  criterion  requires  calculation 
of  the  Bayes  factor  (ratio  of  predictive  distributions)  adjusted  by  a  weight  which  can  be 
regarded  as  the  prior  odds  associated  with  the  models.  Poskitt  and  Tremayne  (1983)  nicely 
unify  such  Bayesian  model  selection  for  time  series  models  in  showing  that  various 
established  criteria,  e.g.,  AIC,  BIG,  <f>^  and  S  may  be  viewed  as  approximate  Bayes  factors 
adjusted  by  weights  reflecting  sample  size  and  model  dimension.  Unfortunately,  with 
improper  priors  on  the  parameters  (as  in  our  case)  predictive  distributions  become 
improper  so  that  interpretation  of  the  Bayes  factor  is  unclear.  Possible  remedies  involve 
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utilizing  the  predictive  distribution  in  a  different  form.  See  Gelfand,  Dey  and  Chang 
(1992)  for  a  recent  discussion  as  well  as  for  ad  hoc  criteria  compatible  with  the 
sampling— based  approach  we  have  adopted  here. 

Perhaps  more  importantly,  we  would  prefer  that  model  selection  not  be  based  upon 
a  single  number.  Rather,  we  would  prefer  to  compare  the  predictive  performance  of 
competing  models  at  each  time  point.  Hence  we  propose  to  look  at  the  predictive 
distribution  for  the  entire  series  given  what  we  have  observed.  More  specifically  if  the 
observed  series  is  =  (z^  •  •  • ,  z^  and  we  wish  to  predict  a  replication  of  the 

series  z  =  (zj ,  •  *  • ,  )  given  (and  a  particular  ARMA  model)  then  the  predictive 

distribution  of  z  given  z^^^  is,  analogous  to  (10), 

To  draw  samples  from  f(z|z^^^)  we  need  only  draw  ^  from  ir(n^lz^j^^)  and  then,  given 

draw  z  from  f(z|^).  The  output  of  our  modified  Gibbs  sampler  provides  a  sample 
j=l,  •  •  • ,  m  from  the  posterior.  Given  Zj  can  be  drawn  sequentially  using  (3). 

Comparison  of  the  sampled  Zj ’s,  j=l,  •  •  •,  m  with  z^^^^  can  be  done  in  many  ways. 

We  view  such  comparison  in  a  diagnostic  fashion  eschewing  formal  inference.  In  particular 
an  appealing  graphical  display,  which  is  used  for  the  examples  of  Section  9,  may  be  created 
as  follows.  If  Zj  ~  f(z|z^j^g)  then  the  t***  component  of  Zj ,  Zj  ^  ~  f(Zj  |z^^^)  the  predictive 

distribution  at  the  t^^  time  point.  Suppose  we  use  the  sample  {z^  j=l,  •  •  • ,  m}  to  obtain 

I  I  ‘o  J  P'®*  I  %b.  ■  I  ‘obs)  I  ™  '' 

This  display  reveals  model  adequacy  via  a  point  doud  close  to  the  origin,  i.e., 
predictive  distributions  have  small  dispersion  and  the  observations  are  consonant  with 
these  distributions.  Furthermore,  all  points  lying  bdow  the  line  y=2x  on  this  plot  are  such 
that  the  observed  y  roughly  falls  within  a  95%  predictive  interval.  Outlying  observations 
will  lie  well  above  this  line.  Extending  these  ideas,  the  display  becomes  an  informal  modd 
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choice  plot.  In  particular,  suppose  we  are  comparing  nested  models.  As  we  run  through  a 
portfolio  of  nested  models,  with  increasing  dimensionality,  we  expect  to  pass  from  say  a 
poor  model  to  a  satisfactory  parsimonious  choice  to  an  overfitted  one.  This  will  be 
manifested  in  the  display  as  follows.  The  poor  model  will  perform  badly  on  both  the  x  and 
y  scales,  the  satisfactory  parsimonious  model  will  perform  well  on  both  scales,  the 
overfitted  model  will  perform  well  on  the  y  scale  but  will  yield  less  concentrated  predictive 
distributions  than  the  parsimonious  one  and  hence  do  worse  on  the  x  scale.  The  same 
principles  should  guide  choice  between  two  nonnested  models,  e.g.,  ARMA  (3,5)  vs.  ARMA 
(4,2)  —  the  one  with  the  point  cloud  closer  to  the  origin  is  preferred.  When  the  two  clouds 
overlap  considerably  choice  between  models  is  unclear.  This  recognition  seems  preferable 
to  a  decision  based  upon  a  single  number.  Returning  to  nested  models,  an  alternative 
informal  model  selection  approach  looks  at  the  posterior  distribution  of  the  "discrepancy" 
parameters,  i.e.,  the  parameters  in  the  full  model  which  are  not  in  the  reduced  model.  For 
instance,  in  comparing  an  ARMA  (3,3)  with  an  ARMA  (2,3),  we  would  examine  the 
posterior  distribution  of  ^3  by  using  the  generated  ,  j=l,  •  •  •,  m  to  see  where  0  falls. 
Again,  we  illustrate  the  use  of  these  model  determination  tools  in  conjunction  with  the 
examples  of  the  next  section. 

9.  Blnstrative  Examples 

We  present  two  examples  to  illustrate  our  methodology.  Example  1  consists  of  the 
quarterly  seasonally  adjusted  U.S.  unemployment  rate  between  1948—1972  (Fuller,  1976),  a 
series  of  n=100  observations.  We  model  this  data  by  autoregressive  processes.  AR(p) 
models,  l<p<5  have  been  fitted  by  Shumway  (1988),  and  the  AR(2)  model  was  selected  as  a 
best  parsimonious  choice  since  the  maximum  likelihood  estimates  of  ^3,  and  <f>^  are  not 
significantly  different  firom  0.  Using  the  likelihood  in  (4)  we  present  here  the  results  from 
an  exact  Bayesian  analysis.  In  Table  1  we  fit  the  AR(1)  model  based  on  the  first  96 
observations,  the  last  4  observations  being  held  out  for  forecast  evaluation.  The  maximum 
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likelihood  estimates  for  the  parameters,  from  which  the  Gibbs  sampler  is  started,  are 
obtained  by  the  Ansley  algorithm.  Table  1  presents  the  MLE’s  together  with  their 
standard  errors.  Also  given  are  the  mean  and  standard  derivation  as  well  as  selected 
percentiles  of  the  marginal  posterior  distributions  for  the  parameters  obtained  from  the 
Gibbs  sampler.  We  note  that  the  values  at  the  50^**  percentile  for  the  (p^ ,  n  and 
correspond  closely  to  the  MLE’s.  The  extreme  percentiles  enable  equal  tail  Bayesian 
interval  estimates. 

Similarly,  Tables  2-5  present  the  results  of  fitting  the  AR(2),  AR(3),  AR(4)  and 
AR(5)  models  respectively.  Interval  estimates  for  the  parameters  ^3,  (p^  and  <p^ 
comfortably  contain  0,  enabling  selection  of  the  AR(2)  model.  For  this  model  we  also 
present  in  Table  2,  results  from  assuming  0,  1,  5  and  20  percent  of  the  n=96  observations 
as  missing.  As  in  Section  6,  we  treat  the  missing  observations  as  parameters  along  with 
1  Ml  .  Inference  seemvS  to  be  little  affected  even  with  as  much  as  20%  missing  data. 
Table  6  presents  the  forecasts  for  the  data  modeled  by  AR(2)  model,  as  described  in 
Section  7,  for  t=97,  98,  •  •  • ,  100.  Figure  1  presents  a  display  of  the  type  described  in 
Section  8  showing  the  AR(1),  AR(2)  and  AR(5)  models.  Clearly  the  AR(1)  model  is  poor 
with  the  AR(2)  and  AR(5)  models  quite  similar. 

The  data  for  the  second  example  are  the  logarithms  (base  10)  of  the  Canadian  lynx 
trap  counts  over  a  114  year  period  (1821—1934).  This  series  has  been  modeled  in  the 
literature  by,  e.g.,  Priestley  (1981)  and  Tong  (1977).  In  the  class  of  ARMA  models,  the 
AR(2),  AR(ll)  and  ARMA  (3,3)  models  have  been  discussed.  Note  that  there  is  no 
nesting  between  the  latter  two.  The  fits  for  the  AR(2)  and  AR(ll)  were  obtained 
analogously  to  those  in  the  first  example  and  are  presented  in  Tables  7  and  8  respectively. 
The  ARMA  (3,3)  modd  was  fit  using  the  likelihood  in  (6)  with  results  given  in  Table  9. 
Finally  Figure  2  presents  a  display  of  the  type  discussed  in  Section  8.  The  AR(ll)  appears 
preferable  to  the  AR(2)  but  the  AR(3,3)  seems  the  best  of  the  three. 
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Table  1:  Parameter  Estimates  for  U.S.  Unemployment  Data  [AR(1)  Model] 

Posterior  Features  from  Gibbs  Sampler 
Percentiles 
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Table  2:  Parameter  Estimates  for  U.S.  Unemployment  Data  [AR(2)  Model] 

Posterior  Features  from  Gibbs  Sampler 
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Table  3:  Parameter  Estimates  for  U.S.  Unemployment  Data  [AR(3)  Model] 
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Table  4:  Parameter  Estimates  for  U.S.  Unemployment  Data  [AR(4)  Model] 
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Table  6:  Forecasts  for  US  Unemployment  Data  [AR(2)  Model] 


Percentiles  from  Gibbs  Sampler 

97.5% 

6.44 

6.44 

6.49 

6.60 

6.74 

6.70 

6.76 

6.89 

6.93 

6.90 

7.00 

7.16 

7.14 

7.20 

7.14 

7.27 

95% 

6.34 

6.35 

6.39 

6.49 

6.55 

6.55 

6.64 

6.74 

6.72 

6.66 

6.74 

6.95 

6.75 

6.68 

6.89 

6.91 

75% 

6.05 

6.02 

6.05 

6.11 

6.06 

5.95 

6.0'2 

6.08 

5.96 

5.88 

5.91 

5.97 

5.80 

5.73 

5.82 

5.85 

50% 

5.83 

5.81 

5.82 

5.86 

5.61 

5.56 

5.57 

5.63 

5.41 

5.25 

5.28 

5.39 

5.09 

4.99 

5.02 

5.18 

25% 

5.61 

5.59 

5.60 

5.54 

5.21 

5.13 

5.16 

5.15 

4.81 

4.73 

4.73 

4.79 

4.43 

4.26 

4.39 

4.47 

5% 

5.31 

5.27 

5.25 

5.11 

4.57 

4.63 

4.54 

4.37 

I  3.95 

1 

1 

3.93 

3.87 

3.78 

3.42 

3.35 

3.42 

3.38 

2.5% 

5.13 

5.15 

5.14 

5.04 

4.37 

4.36 

4.34 

4.03 

3.73 

3.66 

3.64 

3.54 

3.09 

3.16 

2.99 

3.02 

Based 
on  MLE 
(s.e) 

oo  e*5 

Irt  o 

1C 

U5  O 

00  QO 

GO 

IC  O 

<=>  o 
iri 

Actual 

5.83 

5.77 

5.53 

5.30 

Percentage 
of  Missing 
Observations 

O  1-1  «  ^ 

0^-5° 

O  t-<  IC  g 

O  IC  ^ 

Lead 

es 

CO 

»o 

CO 

CO 

.26 

■4< 

m 

.02 

09 

CO 

CO 

CO 

26 

1 

05 

CO 

09 

1 

• 

1 

. 

• 

• 

• 

r 

a> 

ss 

00 

09 

09 

t- 

(O 

o 

(O 

CO 

lO 

CO 

t-H 

es 

o 

CO 

M 

M 

CO 

cs 

lO 

r 

r 

r 

r 

O) 

1-4 

'■ 

es 

00 

09 

N 

CO 

CO 

C3 

1^ 

09 

00 

g 

cs 

CO 

T~l 

o 

O 

CVJ 

p 

cs 

1 

kO 

l' 

1 

r 

r 

C/3 

CO 

! 

■ 

o 

1 

1 

S? 

CO 

00 

C^l 

CO 

o 

fH 

lO 

1  €^ 

o 

o 

P 

CO 

C 

o 

**4 

l' 

* 

r 

* 

|‘ 

* 

* 

r 

O 

to 

03 

»o 

j 

Vm 

CO 

*43 

2 

03 

s> 

<■3 

•a 

g? 

00 

00 

'(f 

es 

<o 

o 

cs 

o 

(9 

V 

o! 

o 

«o 

CO 

o 

c< 

o 

1-^ 

p 

rr 

&4 

»o 

cs 

1-H 

r 

r 

r 

r 

r 

r 

Uj 

O 

c 

a; 

w 

V9 

o 

o 

m 

o 

CO 

o 

«o 

09 

kO 

CO 

o 

00 

04 

5% 

o 

o 

kO 

CO 

cs 

^4 

p 

*— t 

\ 

i 

o 

r 

r 

r 

r 

r 

r 

o 

r 

1 

S? 

t- 

o 

00 

cs 

CO 

o 

oo 

■«!»< 

o 

09 

00 

o 

»o 

•«r 

es 

CO 

o 

kC 

UO 

r 

r 

l' 

r 

r 

r 

r 

r 

r 

r 

es 

1 

Hf 

Ri 

■■ 

■■ 

f 

•4 

m  09 

09'^ 

CO^  ! 

i 

CO 

c 

3  t3 

l-H  O 

es 

o  <-; 

1  ^  -~* 

I 

S'S.  1 

■ 

■ 

m 

■ 

HI 

1 

IH 

H 

I-? 

S>=' 

o> 

U5'^ 

(O  >c 

O*^ 

05 

w  o 

1 

“3  1 

g 

p  ^ 

i 

Ci  1-^ 

CO  o 

IH 

03 

•¥» 

03 

-N 

CO 

lA 

h- 

00 

o 

*■ 

j 

2 

-e- 

0^ 

Posterior  Features  from  Gibbs  Sampler 


"S 

0 

0 


i 


I 

00 


I 


a; 


k< 

4) 

Pu. 


»o 


O) 


feS 

iO 

O) 


1C 


SS 

o 

1C 


g? 

1C 

es 


1C 


S? 

1C 

es 


0^ 


0 


13 

Ck 


c 

s 

u 


a 

-ts 

rd 

Q 

X 

c 

>. 

G 

rt  ” 
C  C*3 

6^ 

lu"^ 

^  II 

■*J  * 


Ci.<; 

0)  II 
.ii  < 
o  _ 

J3^ 
o  cs 

pci 

o  II 


(U 

>-( 

9 

bO 


<<PQm  Su^cz  q»> 


UNCLASSIFIED 


ECU^ITY  CLASliriCATION  OF  This  ^AGC  rtFhM  0«a  satWMI 


REPORT  COCUMENTATiON  PAGE 


report  numscr 


READ  INSTRUCTIONS 
BEFORE  COMPLETING  FORM 


2.  covr  AccessiON  noj  2.  rccipient’s  catalos  MuMSEfi 


«.  TITUC  Tand 

Bayesian  Analysis  of  ARMA  Processes:  Complete 
Sampling  Based  Inference  Under  Full  Likelihoods 


7.  AuTMORfa; 

John  Marriott,  N.  Ravishanker,  &  A.E.  Gelfand 


S.  TYPE  OF  NCPORT  A  PCRIOO  COVCNCO 

Technical 


•  .  PKRF0RMIN6  ONO.  MCPORT  NUMBCR 

No.  471 


1.  CONTRACT  OR  CRAMT  NUMOKRfaj 

N0025-92-J-1264 


t.  performino  organization  name  ano  aoorcss 

Department  of  Statistics 

Stanford  University 
Stanford,  CA  94305-4065 


II.  CONTROLLING  OFFIce  NAME  ANO  ADDRESS 

Office  of  Naval  Research 
Statistics  &  Probability  Program 
Code  11" 


10.  PROGRAM  ELEMENT.  PROJECT,  task 
AREA  A  WORK  UNIT  NUMBERS 


NR-042-267 


12.  REPORT  OAT! 

June  24,  1993 

;  12.  ....iBER  OF  PAGES 

i  22 


IS.  DISTRIBUTION  STATEMENT  (o!  liila  Rapwu 

.Approved  for  public  release;  distribution  unlimixed. 


CamtnilMA  Offlcaj 

IS.  security  class.  (•!  <Aia  rapaatj 

Unclassified 

ISa.  OECLASSIFICATION/OOWNORAOING 

SCHEDULE  1 

17.  DISTRIBUTION  STATEMENT  (ot  i/ia  aoairao  aaiarad  tn  Blaca  20,  II  d«//aranl  Irmm  Raparfj 


THE  VIEW,  OPINIONS,  ANO/OR  FINDINGS  CONTAINED  IN  THIS  REPORT 
ARE  THOSE  OF  THE  AUTHORJS)  AND  SHOULD  NOT  DE  CONSTRUED  AS 
AN  OFFICIAL  DEPARTMENT  OF  THE  ARMY  POSITIO::,  POLICY,  CR  DE¬ 
CISION,  UNLESS  SO  DESIGNATED  BY  OTHER  DOCUf/.ENTATIO.N, 


IS.  KEY  WORDS  TCanitawa  an  ravaraa  a'da  «  naaaaaaar  And  Idantllr  Aw  MpaA  i 


Ktyviords:  Gibbs  sampler,  mvertibilUy,  latent  variables,  Metropolis  aiyorithm,  missing 
values,  outliers,  prediction,  stationoTiiy. 


20.  AISTMACT  fCMilmM  an  i 


See  Reverse  Side 


DO  I  J  AN*?!  1473  COITION  OF  I  NOV  BS  IB  OBSOLCTC 
S/N  OI02-OI4-«AOt  I 


UNCLASSIFIED _ . 

security  eUABBIFICATIOR  OF  TPIS  PAGE  fNan  Pa«a  Baiaiadl 


BAYESIAN  ANALYSIS  OF  ARMA  PROCESSES;  COMPLETE  SAMPLING  BASED 
INFERENCE  UNDER  FULL  LIKELIHOODS 
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Abstract 

For  a  general  stationary  and  invertible  ARMA  (p,q)  process,  we  show  how  to  carry 
out  a  fully  Bayesian  analysis.  Our  approach  is  through  the  use  of  sampling  based  methods 
involving  three  novel  aspects.  First  the  constraints  on  the  parameter  space  arising  from 
the  stationarity  and  invertibility  conditions  are  handled  by  a  convenient  reparametriaation 
to  all  of  Euclidean  (p+q)-space.  Seojnd,  required  sampling  is  facilitated  by  the 
introduction  of  latent  variables  which,  though  increasing  the  dimensionality  of  the 
problem,  greatly  simplifies  the  evaluation  of  the  likelihood.  Third,  the  particular  sampling 
based  approach  used  is  a  Markov  chain  Monte  Carlo  method  which  is  a  hybrid  of  the  Gibbs 
sampler  and  the  Metropolis  algorithm.  We  also  briefly  show  how  straightforwardly  the 
sampling  based  approach  accommodates  missing  observations,  outlier  detection,  prediction 
and  model  determination.  Finally  we  illustrate  the  approach  with  two  examples. 


